A General Method for Estimating a Linear Structural Equation System*

Karl G. Jöreskog
Educational Testing Service

Abstract

A general method for estimating the unknown coefficients in a set of linear structural equations is described. In its most general form the method allows for both errors in equations (residuals, disturbances) and errors in variables (errors of measurement, observational errors) and yields estimates of the residual variance-covariance matrix and the measurement error variances as well as estimates of the unknown coefficients in the structural equations, provided all these parameters are identified. Two special cases of this general method are discussed separately. One is when there are errors in equations but no errors in variables. The other is when there are errors in variables but no errors in equations. The methods are applied and illustrated using artificial, economic and psychological data.

1. Introduction

We shall describe a general method for estimating the unknown coefficients in a set of linear structural equations. In its most general form the method will allow for both errors in equations (residuals, disturbances) and errors in variables (errors of measurement, observational errors) and will yield estimates of the residual variance-covariance matrix and the measurement error variances as well as estimates of the unknown coefficients in the structural equations, provided all these parameters are identified. After giving the results for this general case, two special cases will be considered.

The first is the case when there are errors in equations but no errors in variables. This case has been studied extensively by econometricians (see e.g., Goldberger, 1964, Chapter 7). The second case is when there are errors in variables but no errors in equations. Models of this kind have been studied under the name of path analysis by biometricians (see e.g., Turner & Stevens, 1959), sociologists (see e.g., Blalock, 1964) and psychologists (Werts & Linn, 1970).

It is assumed that the observed variables have a multinormal distribution and the unknown parameters are estimated by the maximum likelihood method. The estimates are computed numerically using a modification of the Fletcher-Powell minimization algorithm (Fletcher & Powell, 1963; Gruvaeus & Jöreskog, 1970). Standard errors of the estimated parameters may be obtained by computing the inverse of the information matrix. A computer program, LISREL, in FORTRAN IV, that performs all the necessary computations has been written and tested out on the IBM 360/65; a write-up of this is under preparation (Jöreskog & van Thillo, 1970).

*This research has been supported in part by grant NSF-GB-12959 from the National Science Foundation. The author wishes to thank Professor Arthur Goldberger for his comments on an earlier draft of the paper and Marielle van Thillo, who wrote the computer programs, checked the mathematical derivations and gave other valuable assistance throughout the work.

In the first special case referred to above, where there are no errors of measurement in the observed variables, the general method to be presented is equivalent to the full information maximum likelihood (FIML) method of Koopmans, Rubin and Leipnik (1950), also called the full information least generalized residual variance (FILGRV) method (Goldberger, 1964, Chapter 7), provided that no constraints are imposed on the residual variance-covariance matrix and the variance-covariance matrix of the independent variables. However, with the general method described here, it is possible to assign fixed values to some elements of these matrices and also to have equality constraints among the remaining elements.

2. The General Model

Consider random vectors $\eta' = (\eta_1, \eta_2, \ldots, \eta_m)$ and $\xi' = (\xi_1, \xi_2, \ldots, \xi_n)$ of true dependent and independent variables, respectively, and the following system of linear structural relations

$$ B\eta = \Gamma\xi + \zeta \tag{1} $$

where $B$ ($m \times m$) and $\Gamma$ ($m \times n$) are coefficient matrices and $\zeta' = (\zeta_1, \zeta_2, \ldots, \zeta_m)$ is a random vector of residuals (errors in equations, random disturbance terms). Without loss of generality it may be assumed that $\mathcal{E}(\eta) = \mathcal{E}(\zeta) = 0$ and $\mathcal{E}(\xi) = 0$. It is furthermore assumed that $\zeta$ is uncorrelated with $\xi$ and that $B$ is nonsingular.
The vectors $\eta$ and $\xi$ are not observed but instead vectors $y' = (y_1, y_2, \ldots, y_m)$ and $x' = (x_1, x_2, \ldots, x_n)$ are observed, such that

$$ y = \mu + \eta + \varepsilon \tag{2} $$

$$ x = \nu + \xi + \delta \tag{3} $$

where $\mu = \mathcal{E}(y)$, $\nu = \mathcal{E}(x)$ and $\varepsilon$ and $\delta$ are vectors of errors of measurement in $y$ and $x$, respectively. It is convenient to refer to $y$ and $x$ as the observed variables and $\eta$ and $\xi$ as the true variables. The errors of measurement are assumed to be uncorrelated with the true variates and among themselves.

Let $\Phi$ ($n \times n$) and $\Psi$ ($m \times m$) be the variance-covariance matrices of $\xi$ and $\zeta$, respectively, and $\Theta_\varepsilon^2$ and $\Theta_\delta^2$ the diagonal matrices of error variances for $y$ and $x$, respectively. Then it follows, from the above assumptions, that the variance-covariance matrix $\Sigma$ [$(m + n) \times (m + n)$] of $z = (y', x')'$ is

$$ \Sigma = \begin{pmatrix} B^{-1}\Gamma\Phi\Gamma'B'^{-1} + B^{-1}\Psi B'^{-1} + \Theta_\varepsilon^2 & B^{-1}\Gamma\Phi \\ \Phi\Gamma'B'^{-1} & \Phi + \Theta_\delta^2 \end{pmatrix} \tag{4} $$

The elements of $\Sigma$ are functions of the elements of $B$, $\Gamma$, $\Phi$, $\Psi$, $\Theta_\varepsilon$ and $\Theta_\delta$. In applications some of these elements are fixed and equal to assigned values. In particular this is so for elements in $B$ and $\Gamma$, but we shall allow for fixed values even in the other matrices. For the remaining nonfixed elements of the six parameter matrices one or more subsets may have identical but unknown values. Thus parameters in $B$, $\Gamma$, $\Phi$, $\Psi$, $\Theta_\varepsilon$ and $\Theta_\delta$ are of three kinds: (i) fixed parameters that have been assigned given values, (ii) constrained parameters that are unknown but equal to one or more other parameters and (iii) free parameters that are unknown and not constrained to be equal to any other parameter.
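As an illustration only, the covariance structure in equation (4) can be sketched in a few lines of NumPy. The function name `implied_covariance` and its argument names are assumptions made for this sketch and are not part of the LISREL program; the error standard deviations are passed as vectors holding the diagonals of the two error matrices.

```python
import numpy as np

def implied_covariance(B, Gamma, Phi, Psi, theta_eps, theta_delta):
    """Covariance matrix of z = (y', x')' implied by equation (4).

    theta_eps and theta_delta are 1-D arrays holding the diagonal
    elements (error standard deviations) of the two error matrices.
    """
    A = np.linalg.inv(B)              # A = B^{-1}
    D = A @ Gamma                     # D = B^{-1} Gamma
    S_yy = D @ Phi @ D.T + A @ Psi @ A.T + np.diag(np.square(theta_eps))
    S_yx = D @ Phi
    S_xx = Phi + np.diag(np.square(theta_delta))
    return np.block([[S_yy, S_yx], [S_yx.T, S_xx]])

# Tiny hypothetical example with one dependent and one independent variable.
Sigma = implied_covariance(
    B=np.array([[1.0]]), Gamma=np.array([[2.0]]),
    Phi=np.array([[1.0]]), Psi=np.array([[0.5]]),
    theta_eps=np.array([0.1]), theta_delta=np.array([0.2]),
)
```

In this toy case the variance of y is 2·1·2 + 0.5 + 0.01, the covariance with x is 2, and the variance of x is 1 + 0.04.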

Before an attempt is made to estimate a model of this kind, the identification problem must be examined. The identification problem depends on the specification of fixed, constrained and free parameters. Under a given specification, a given structure $B$, $\Gamma$, $\Phi$, $\Psi$, $\Theta_\varepsilon$, $\Theta_\delta$ generates one and only one $\Sigma$, but there may be several structures generating the same $\Sigma$. If two or more structures generate the same $\Sigma$, the structures are said to be equivalent. If a parameter has the same value in all equivalent structures, the parameter is said to be identified. If all parameters of the model are identified, the whole model is said to be identified. When a model is identified one can usually find consistent estimates of all its parameters. Some rules for investigating the identification problem when there are no errors in variables are given by Goldberger (1964, pp. 306-318).

3. Estimation of the General Model

Let $z_1, z_2, \ldots, z_N$ be $N$ observations of $z = (y', x')'$. Since no constraints are imposed on the mean vector $(\mu', \nu')'$, the maximum likelihood estimate of this is the usual sample mean vector $\bar{z} = (\bar{y}', \bar{x}')'$. Let

$$ S = \frac{1}{N}\sum_{\alpha=1}^{N} (z_\alpha - \bar{z})(z_\alpha - \bar{z})' \tag{5} $$

be the usual sample variance-covariance matrix, partitioned as

$$ S\,[(m + n) \times (m + n)] = \begin{pmatrix} S_{yy}\,(m \times m) & S_{yx}\,(m \times n) \\ S_{xy}\,(n \times m) & S_{xx}\,(n \times n) \end{pmatrix} \tag{6} $$
The logarithm of the likelihood function, omitting a function of the observations, is given by

$$ \log L = -\tfrac{1}{2} N \left[\log|\Sigma| + \mathrm{tr}(S\Sigma^{-1})\right] \tag{7} $$

This is regarded as a function of the independent distinct parameters in $B$, $\Gamma$, $\Phi$, $\Psi$, $\Theta_\varepsilon$ and $\Theta_\delta$ and is to be maximized with respect to these, taking into account that some elements may be fixed and some may be constrained to be equal to some others. Maximizing $\log L$ is equivalent to minimizing

$$ F = \tfrac{1}{2} N \left[\log|\Sigma| + \mathrm{tr}(S\Sigma^{-1})\right] \tag{8} $$
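A minimal sketch of the fitting function (8), assuming that S and Σ are available as NumPy arrays; the name `fit_function` is hypothetical. For a fixed positive definite S the function is smallest when Σ equals S, which the small example illustrates.

```python
import numpy as np

def fit_function(S, Sigma, N):
    """F = (N/2) [log|Sigma| + tr(S Sigma^{-1})], equation (8)."""
    sign, logdet = np.linalg.slogdet(Sigma)
    if sign <= 0:
        raise ValueError("Sigma must be positive definite")
    # tr(S Sigma^{-1}) equals tr(Sigma^{-1} S); solve avoids an explicit inverse.
    return 0.5 * N * (logdet + np.trace(np.linalg.solve(Sigma, S)))

S = np.eye(2)
F_at_S = fit_function(S, S, N=10)                # Sigma equal to S
F_elsewhere = fit_function(S, 2.0 * np.eye(2), N=10)
```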



Such a minimization problem may be formalized as follows. 

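Let $\lambda' = (\lambda_1, \lambda_2, \ldots, \lambda_p)$ be a vector of all the elements of $B$, $\Gamma$, $\Phi$, $\Psi$, $\Theta_\varepsilon$ and $\Theta_\delta$ arranged in a prescribed order. Then $F$ may be regarded as a function $F(\lambda)$ of $\lambda_1, \lambda_2, \ldots, \lambda_p$, which is continuous and has continuous derivatives $\partial F/\partial\lambda_i$ and $\partial^2 F/\partial\lambda_i\partial\lambda_j$ of first and second order, except where $\Sigma$ is singular. The totality of these derivatives is represented by a gradient vector $\partial F/\partial\lambda$ and a symmetric matrix $\partial^2 F/\partial\lambda\partial\lambda'$. Now let some $p - q$ of the $\lambda$'s be fixed and denote the remaining $\lambda$'s by $\pi_1, \pi_2, \ldots, \pi_q$, $q \le p$. The function $F$ is now considered as a function $G(\pi)$ of $\pi_1, \pi_2, \ldots, \pi_q$. Derivatives $\partial G/\partial\pi$ and $\partial^2 G/\partial\pi\partial\pi'$ are obtained from $\partial F/\partial\lambda$ and $\partial^2 F/\partial\lambda\partial\lambda'$ by omitting rows and columns corresponding to the fixed $\lambda$'s. Among $\pi_1, \pi_2, \ldots, \pi_q$, let there be some $r$ distinct parameters denoted $\kappa_1, \kappa_2, \ldots, \kappa_r$, $r \le q$, so that each $\pi_i$ is equal to one and only one $\kappa_j$, but possibly several $\pi$'s equal the same $\kappa$. Let $K = (k_{ij})$ be a matrix of order $q \times r$ with elements $k_{ij} = 1$ if $\pi_i = \kappa_j$ and $k_{ij} = 0$ otherwise. The function $F$ (or $G$) is now a function $H(\kappa)$ of $\kappa_1, \kappa_2, \ldots, \kappa_r$ and we have

$$ \partial H/\partial\kappa = K'(\partial G/\partial\pi) \tag{9} $$

$$ \partial^2 H/\partial\kappa\partial\kappa' = K'(\partial^2 G/\partial\pi\partial\pi')K \tag{10} $$

Thus, the derivatives of $H$ are simple sums of the derivatives of $G$.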
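The bookkeeping in (9) and (10) can be sketched as follows. The helper `constraint_matrix` and the small numerical example are hypothetical and only illustrate how equality constraints collapse the derivatives of G into those of H.

```python
import numpy as np

def constraint_matrix(groups, q):
    """Build K (q x r): groups[j] lists the indices i with pi_i = kappa_j."""
    K = np.zeros((q, len(groups)))
    for j, members in enumerate(groups):
        for i in members:
            K[i, j] = 1.0
    return K

# Hypothetical case: pi_0 and pi_2 are constrained equal (kappa_0), pi_1 is free (kappa_1).
K = constraint_matrix([[0, 2], [1]], q=3)
grad_G = np.array([1.0, 2.0, 3.0])       # dG/dpi at some point
hess_G = np.diag([1.0, 2.0, 3.0])        # d2G/dpi dpi' at the same point

grad_H = K.T @ grad_G                    # equation (9)
hess_H = K.T @ hess_G @ K                # equation (10)
```

Each element of the gradient of H is the sum of the derivatives of G over the parameters that share the same value, which is what "simple sums" means here.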

The minimization of $H(\kappa)$ is now a straightforward application of the Fletcher-Powell method for which a computer program is available (Gruvaeus & Jöreskog, 1970). This method makes use of a matrix $E$, which is evaluated in each iteration. Initially $E$ is any positive definite matrix approximating the inverse of $\partial^2 H/\partial\kappa\partial\kappa'$. In subsequent iterations $E$ is improved, using the information built up about the function, so that ultimately $E$ converges to an approximation of the inverse of $\partial^2 H/\partial\kappa\partial\kappa'$ at the minimum. If there are many parameters, the number of iterations may be excessive, but can be considerably decreased by the provision of a good initial estimate of $E$. Such an estimate may be obtained by inverting the information matrix

$$ \mathcal{E}(\partial^2 H/\partial\kappa\partial\kappa') = K'\,\mathcal{E}(\partial^2 G/\partial\pi\partial\pi')\,K , \tag{11} $$

where $\mathcal{E}(\partial^2 G/\partial\pi\partial\pi')$ is obtained from

$$ \mathcal{E}(\partial^2 F/\partial\lambda\partial\lambda') = \mathcal{E}(\partial F/\partial\lambda \; \partial F/\partial\lambda') \tag{12} $$

as described above. When the minimum of $H$ has been found, the inverse of the information matrix may be computed again to obtain standard errors of all the parameters in $\kappa$. A general method for obtaining the elements of $\mathcal{E}(\partial F/\partial\lambda \; \partial F/\partial\lambda')$ is given in Appendix A2.
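The role of the matrix E can be illustrated with a bare-bones variable metric iteration. This is a sketch under stated assumptions, not the modified algorithm of Gruvaeus and Jöreskog: it uses the standard Davidon-Fletcher-Powell rank-two update of E together with a simple backtracking line search (an assumption made here for brevity), and the function name `fletcher_powell` is hypothetical.

```python
import numpy as np

def fletcher_powell(f, grad, x0, E0=None, tol=1e-6, max_iter=200):
    """Minimize f with a DFP-type update of E, an approximate inverse Hessian."""
    x = np.asarray(x0, dtype=float)
    E = np.eye(x.size) if E0 is None else np.array(E0, dtype=float)
    g = grad(x)
    for _ in range(max_iter):
        if np.max(np.abs(g)) < tol:          # all derivatives small: stop
            break
        d = -E @ g                           # search direction
        if g @ d >= 0:                       # safeguard: fall back to steepest descent
            E = np.eye(x.size)
            d = -g
        step, fx = 1.0, f(x)
        # Backtracking (Armijo) line search; halve the step until f decreases enough.
        while f(x + step * d) > fx + 1e-4 * step * (g @ d) and step > 1e-12:
            step *= 0.5
        s = step * d
        g_new = grad(x + s)
        y = g_new - g
        # Update E only when the curvature condition s'y > 0 holds comfortably.
        if s @ y > 1e-10 * np.linalg.norm(s) * np.linalg.norm(y):
            Ey = E @ y
            E = E + np.outer(s, s) / (s @ y) - np.outer(Ey, Ey) / (y @ Ey)
        x, g = x + s, g_new
    return x, E

# Hypothetical quadratic test function with known minimum at (0.2, 0.4).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_min, E_final = fletcher_powell(lambda x: 0.5 * x @ A @ x - b @ x,
                                 lambda x: A @ x - b,
                                 np.zeros(2))
```

Passing a good `E0`, for example the inverse of the information matrix in (11), corresponds to the device described above for reducing the number of iterations.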

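The application of the Fletcher-Powell method requires formulas for the derivatives of $F$ with respect to the elements of $B$, $\Gamma$, $\Phi$, $\Psi$, $\Theta_\varepsilon$ and $\Theta_\delta$. These may be obtained by matrix differentiation as shown in Appendix A1. Writing $A = B^{-1}$, $D = B^{-1}\Gamma$ and

$$ \Omega = \begin{pmatrix} \Omega_{yy} & \Omega_{yx} \\ \Omega_{xy} & \Omega_{xx} \end{pmatrix} = \Sigma^{-1}(\Sigma - S)\Sigma^{-1} , \tag{13} $$

the derivatives are

$$ \partial F/\partial B = -N A'(\Omega_{yy} D \Phi D' + \Omega_{yx}\Phi D' + \Omega_{yy} A \Psi A') \tag{14} $$

$$ \partial F/\partial\Gamma = N A'(\Omega_{yy} D \Phi + \Omega_{yx}\Phi) \tag{15} $$

$$ \partial F/\partial\Phi = \tfrac{1}{2} N (D'\Omega_{yy} D + D'\Omega_{yx} + \Omega_{xy} D + \Omega_{xx}) \tag{16} $$

$$ \partial F/\partial\Psi = \tfrac{1}{2} N A'\Omega_{yy} A \tag{17} $$

$$ \partial F/\partial\Theta_\varepsilon = N \Omega_{yy}\Theta_\varepsilon \tag{18} $$

$$ \partial F/\partial\Theta_\delta = N \Omega_{xx}\Theta_\delta \tag{19} $$

In these expressions we have not taken into account that $\Phi$ and $\Psi$ are symmetric and that $\Theta_\varepsilon$ and $\Theta_\delta$ are diagonal matrices. The off-diagonal zero elements of $\Theta_\varepsilon$ and $\Theta_\delta$ are treated as fixed parameters and the off-diagonal elements of $\Phi$ and $\Psi$ as constrained parameters.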
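Analytic derivatives of this kind are easy to get wrong in code, so a finite-difference comparison is a useful safeguard. The sketch below checks the derivative of F with respect to Γ, written as N A'(Ω_yy D Φ + Ω_yx Φ) with Ω = Σ⁻¹(Σ − S)Σ⁻¹, against central differences. All function names and the numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np

def implied_sigma(B, G, Phi, Psi, th_e, th_d):
    A = np.linalg.inv(B)
    D = A @ G
    S_yy = D @ Phi @ D.T + A @ Psi @ A.T + np.diag(th_e ** 2)
    S_yx = D @ Phi
    return np.block([[S_yy, S_yx], [S_yx.T, Phi + np.diag(th_d ** 2)]])

def fit(B, G, Phi, Psi, th_e, th_d, S, N):
    Sigma = implied_sigma(B, G, Phi, Psi, th_e, th_d)
    return 0.5 * N * (np.linalg.slogdet(Sigma)[1]
                      + np.trace(S @ np.linalg.inv(Sigma)))

def grad_wrt_gamma(B, G, Phi, Psi, th_e, th_d, S, N):
    """Analytic dF/dGamma = N A'(Omega_yy D Phi + Omega_yx Phi)."""
    m = B.shape[0]
    A = np.linalg.inv(B)
    D = A @ G
    Sigma = implied_sigma(B, G, Phi, Psi, th_e, th_d)
    Sinv = np.linalg.inv(Sigma)
    Omega = Sinv @ (Sigma - S) @ Sinv
    return N * A.T @ (Omega[:m, :m] @ D @ Phi + Omega[:m, m:] @ Phi)

def numeric_grad_wrt_gamma(B, G, Phi, Psi, th_e, th_d, S, N, h=1e-6):
    """Central finite differences, one element of Gamma at a time."""
    out = np.zeros_like(G)
    for i in range(G.shape[0]):
        for j in range(G.shape[1]):
            Gp, Gm = G.copy(), G.copy()
            Gp[i, j] += h
            Gm[i, j] -= h
            out[i, j] = (fit(B, Gp, Phi, Psi, th_e, th_d, S, N)
                         - fit(B, Gm, Phi, Psi, th_e, th_d, S, N)) / (2 * h)
    return out

# Hypothetical parameter values (m = n = 2) and a hypothetical "sample" matrix S.
B = np.array([[1.0, -0.3], [0.0, 1.0]])
G = np.array([[0.5, 0.2], [0.1, 0.7]])
Phi = np.array([[1.0, 0.3], [0.3, 2.0]])
Psi = np.array([[0.5, 0.1], [0.1, 0.4]])
th_e, th_d = np.array([0.3, 0.4]), np.array([0.2, 0.5])
S = implied_sigma(B, G, Phi, Psi, th_e, th_d) + 0.1 * np.eye(4)
analytic = grad_wrt_gamma(B, G, Phi, Psi, th_e, th_d, S, N=50)
numeric = numeric_grad_wrt_gamma(B, G, Phi, Psi, th_e, th_d, S, N=50)
```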

When the maximum likelihood estimates of the parameters have been obtained, the goodness of fit of the model may be tested, in large samples, by the likelihood ratio technique. Let $H_0$ be the null hypothesis of the model under the given specifications of fixed, constrained and free parameters. The alternative hypothesis $H_1$ may be that $\Sigma$ is any positive definite matrix.

Under $H_1$, the maximum of $\log L$ is (see e.g., Anderson, 1958, Chapter 3),

$$ \log L_1 = -\tfrac{1}{2} N (\log|S| + m + n) $$

Under $H_0$, the maximum of $\log L$ is equal to minus the minimum value $F_0$ of $F$. Thus minus 2 times the logarithm of the likelihood ratio becomes

$$ U = 2F_0 - N \log|S| - N(m + n) . \tag{20} $$

If the model holds, $U$ is distributed, in large samples, as $\chi^2$ with

$$ d = \tfrac{1}{2}(m + n)(m + n + 1) - r \tag{21} $$

degrees of freedom, where, as before, $r$ is the total number of independent parameters estimated under $H_0$.
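The test statistic and its degrees of freedom, U = 2F₀ − N log|S| − N(m + n) and d = ½(m + n)(m + n + 1) − r, involve only quantities already computed. A minimal sketch, with the hypothetical name `likelihood_ratio_test`:

```python
import numpy as np

def likelihood_ratio_test(F0, S, N, m, n, r):
    """Return (U, d): the large-sample chi-square statistic and its degrees of freedom."""
    logdet_S = np.linalg.slogdet(S)[1]
    U = 2.0 * F0 - N * logdet_S - N * (m + n)
    d = (m + n) * (m + n + 1) // 2 - r
    return U, d

# Hypothetical check: if the fitted Sigma reproduces S exactly, then
# F0 = (N/2)(log|S| + m + n) and U must be zero.
S = np.eye(3)
F0 = 0.5 * 100 * (0.0 + 3)
U, d = likelihood_ratio_test(F0, S, N=100, m=2, n=1, r=4)
```

The value of U would then be referred to a chi-square table with d degrees of freedom.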
4. The Special Case of No Errors of Measurement

If there are no errors of measurement in $y$ and $x$, the model (1) may be written

$$ By = \Gamma x + u \tag{22} $$

where we have written $u$ instead of $\zeta$. In (22) we have altered the model slightly, compared to (1), (2) and (3), in that the mean vectors have been eliminated. This is no limitation, however, since constant terms in the equations can be handled by using an x-variable that has the value 1 for every observation. In this case, of course, $S$ should be the raw moment matrix instead of the dispersion matrix.

This type of model has been studied for many years by econometricians under the names of causal chains and interdependent systems (e.g., Wold & Jureen, 1953). The variables $y$ and $x$ are economic variables and, in the econometric terminology, the variables are classified as exogenous and endogenous variables, the idea being that the exogenous variables are given from the outside and the endogenous variables are accounted for by the model. From a statistical point of view the distinction is rather between the independent or predetermined variables $x$ and the dependent variables $y$. The residual $u$ represents a random disturbance term assumed to be uncorrelated with the predetermined variables. Observations $y_\alpha$ and $x_\alpha$ on $y$ and $x$ are usually in the form of a time series.

Equation (22) is usually referred to as the structural form of the model. When (22) is premultiplied by $B^{-1}$ one obtains the reduced form

$$ y = \Pi x + u^* , \tag{23} $$

where $\Pi = B^{-1}\Gamma$ and $u^* = B^{-1}u$. $u^*$ is the vector of residuals in the reduced form.

In this case, $\Theta_\varepsilon$ and $\Theta_\delta$ in (4) are zero and therefore $|\Sigma|$ and $\Sigma^{-1}$ in (7) can be written explicitly. It is readily verified that

$$ |\Sigma| = |\Phi|\,|\Psi| / |B|^2 $$

$$ \Sigma^{-1} = \begin{pmatrix} B'\Psi^{-1}B & -B'\Psi^{-1}\Gamma \\ -\Gamma'\Psi^{-1}B & \Phi^{-1} + \Gamma'\Psi^{-1}\Gamma \end{pmatrix} $$

Using these results, $\log L$ becomes

$$ \log L = -\tfrac{1}{2} N \left[\log|\Phi| + \mathrm{tr}(S_{xx}\Phi^{-1})\right] - \tfrac{1}{2} N \left\{\log|\Psi| - \log|B|^2 + \mathrm{tr}\left[(BS_{yy}B' - BS_{yx}\Gamma' - \Gamma S_{xy}B' + \Gamma S_{xx}\Gamma')\Psi^{-1}\right]\right\} $$

If $\Phi$ is unconstrained, maximizing $\log L$ with respect to $\Phi$ gives $\hat{\Phi} = S_{xx}$, which is to be expected, since in this case $\Phi$ is the variance-covariance matrix of $x$. After the likelihood has been maximized with respect to $\Phi$, the reduced likelihood is equal to a constant plus

$$ \log L^* = -\tfrac{1}{2} N \left\{\log|\Psi| - \log|B|^2 + \mathrm{tr}\left[(BS_{yy}B' - BS_{yx}\Gamma' - \Gamma S_{xy}B' + \Gamma S_{xx}\Gamma')\Psi^{-1}\right]\right\} \tag{24} $$
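If also $\Psi$ is unconstrained, further simplification can be obtained, for then (24) is maximized with respect to $\Psi$, for given $B$ and $\Gamma$, when $\Psi$ is equal to

$$ \hat{\Psi} = BS_{yy}B' - BS_{yx}\Gamma' - \Gamma S_{xy}B' + \Gamma S_{xx}\Gamma' \tag{25} $$

so that the function to be maximized with respect to $B$ and $\Gamma$ becomes a constant plus

$$ \log L^{**} = -\tfrac{1}{2} N \left[\log|\hat{\Psi}| - \log|B|^2\right] = -\tfrac{1}{2} N \log\left(|\hat{\Psi}| / |B|^2\right) = -\tfrac{1}{2} N \log|B^{-1}\hat{\Psi}B'^{-1}| = -\tfrac{1}{2} N \log|\hat{\Psi}^*| , \tag{26} $$

where

$$ \hat{\Psi}^* = S_{yy} - S_{yx}\Pi' - \Pi S_{xy} + \Pi S_{xx}\Pi' \tag{27} $$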
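The step from the structural-form residual matrix to the reduced-form residual matrix rests on the identity that the latter equals B⁻¹ times the former times B'⁻¹, so that the two generalized variances differ by the factor |B|². The sketch below checks this numerically; the function names and the numbers are hypothetical.

```python
import numpy as np

def structural_residual_cov(B, G, S_yy, S_yx, S_xx):
    """Residual covariance in the structural form, for given B and Gamma."""
    S_xy = S_yx.T
    return B @ S_yy @ B.T - B @ S_yx @ G.T - G @ S_xy @ B.T + G @ S_xx @ G.T

def reduced_residual_cov(B, G, S_yy, S_yx, S_xx):
    """Residual covariance in the reduced form, with Pi = B^{-1} Gamma."""
    Pi = np.linalg.solve(B, G)
    return S_yy - S_yx @ Pi.T - Pi @ S_yx.T + Pi @ S_xx @ Pi.T

# Hypothetical moment matrices and coefficients (two y's, two x's).
S_yy = np.array([[4.0, 1.0], [1.0, 3.0]])
S_yx = np.array([[1.0, 0.5], [0.2, 1.5]])
S_xx = np.array([[2.0, 0.3], [0.3, 1.0]])
B = np.array([[1.0, -0.4], [0.2, 1.0]])
G = np.array([[0.5, 0.0], [0.0, 0.8]])

Psi_hat = structural_residual_cov(B, G, S_yy, S_yx, S_xx)
Psi_star = reduced_residual_cov(B, G, S_yy, S_yx, S_xx)
B_inv = np.linalg.inv(B)
```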






In deriving (26), we started from the likelihood function (7) based on the assumption of multinormality of $y$ and $x$. Such an assumption may be very unrealistic in most economic applications. Koopmans, Rubin and Leipnik (1950) derived (24) and (26) from the assumption of multinormal residuals $u$, which is probably a better assumption. However, the criterion (26) has intuitive appeal regardless of distributional assumptions and connections with the maximum likelihood method. The matrix $\hat{\Psi}$ in (25) is the variance-covariance matrix of the residuals $u$ in the structural form (22) and the matrix $\hat{\Psi}^*$ in (27) is the variance-covariance matrix of the residuals $u^*$ in the reduced form (23). Maximizing (26) is equivalent to minimizing $|\hat{\Psi}^*|$. Since $|\hat{\Psi}^*|$ is a generalized variance, this method has been called the full information least generalized residual variance (FILGRV) method (see, e.g., Goldberger, 1964, Chapter 7). Several other estimation criteria based on $\hat{\Psi}^*$ have been proposed. Brown (1960) suggested the minimization of $\mathrm{tr}(\hat{\Psi}^*)$ and Zellner (1962) proposed the minimization of $\mathrm{tr}(W^{-1}\hat{\Psi}^*)$ where $W$ is proportional to $S_{yy.x} = S_{yy} - S_{yx}S_{xx}^{-1}S_{xy}$. Malinvaud (1966, Chapter 9) considered the family of estimation criteria $\mathrm{tr}(A\hat{\Psi}^*)$ with arbitrary positive definite weighting matrices $A$.

Since the original article by Koopmans, Rubin and Leipnik (1950) several authors have contributed to the development of the FILGRV method (Chernoff & Divinsky, 1953; Klein, 1953, 1969; Brown, 1959; Eisenpress, 1962; Eisenpress & Greenstadt, 1964; Chow, 1968; Wegge, 1969). This paper will add another computational algorithm to those already existing.

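Minimizing $|\hat{\Psi}^*|$ is equivalent to minimizing

$$ F = \log|\hat{\Psi}| - \log|B|^2 . \tag{28} $$

Matrix derivatives of $F$ with respect to $B$ and $\Gamma$ may be obtained by matrix differentiation as shown in Appendix A3. The results are

$$ \partial F/\partial B = 2\hat{\Psi}^{-1}(BS_{yy} - \Gamma S_{xy}) - 2B'^{-1} \tag{29} $$

$$ \partial F/\partial\Gamma = 2\hat{\Psi}^{-1}(\Gamma S_{xx} - BS_{yx}) \tag{30} $$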
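As a sanity check on derivative formulas of this type, the sketch below codes the criterion log|Ψ̂| − log|B|² together with the gradients 2Ψ̂⁻¹(B S_yy − Γ S_xy) − 2B'⁻¹ and 2Ψ̂⁻¹(Γ S_xx − B S_yx), and compares them with central finite differences. Function names and numerical values are hypothetical.

```python
import numpy as np

def psi_hat(B, G, S_yy, S_yx, S_xx):
    return B @ S_yy @ B.T - B @ S_yx @ G.T - G @ S_yx.T @ B.T + G @ S_xx @ G.T

def criterion(B, G, S_yy, S_yx, S_xx):
    """F = log|Psi_hat| - log|B|^2."""
    return (np.linalg.slogdet(psi_hat(B, G, S_yy, S_yx, S_xx))[1]
            - 2.0 * np.linalg.slogdet(B)[1])

def gradients(B, G, S_yy, S_yx, S_xx):
    """Analytic derivatives of F with respect to B and Gamma."""
    P_inv = np.linalg.inv(psi_hat(B, G, S_yy, S_yx, S_xx))
    dB = 2.0 * P_inv @ (B @ S_yy - G @ S_yx.T) - 2.0 * np.linalg.inv(B).T
    dG = 2.0 * P_inv @ (G @ S_xx - B @ S_yx)
    return dB, dG

def numeric_gradient(fun, M, h=1e-6):
    """Central finite differences of a scalar function of a matrix argument."""
    out = np.zeros_like(M)
    for idx in np.ndindex(*M.shape):
        Mp, Mm = M.copy(), M.copy()
        Mp[idx] += h
        Mm[idx] -= h
        out[idx] = (fun(Mp) - fun(Mm)) / (2 * h)
    return out

# Hypothetical moment matrices and coefficients.
S_yy = np.array([[4.0, 1.0], [1.0, 3.0]])
S_yx = np.array([[1.0, 0.5], [0.2, 1.5]])
S_xx = np.array([[2.0, 0.3], [0.3, 1.0]])
B = np.array([[1.0, -0.4], [0.2, 1.0]])
G = np.array([[0.5, 0.1], [0.0, 0.8]])

dB, dG = gradients(B, G, S_yy, S_yx, S_xx)
dB_num = numeric_gradient(lambda M: criterion(M, G, S_yy, S_yx, S_xx), B)
dG_num = numeric_gradient(lambda M: criterion(B, M, S_yy, S_yx, S_xx), G)
```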



The function $F$ is to be minimized with respect to the elements of $B$ and $\Gamma$, taking into account that some elements are fixed and others are constrained in some way. As will be demonstrated in sections 5 and 6, allowing for equalities among the elements of $B$ and $\Gamma$ is not sufficient to handle some economic applications. Instead, more general constraints may be involved. Usually these constraints are linear but even models with nonlinear constraints have been studied (see, e.g., Klein, 1969). Such constraints can be handled as follows.

Let $\pi' = (\pi_1, \pi_2, \ldots, \pi_q)$ be the vector of all nonfixed elements in $B$ and $\Gamma$. Each of these elements may be a known linear or nonlinear function of $\kappa' = (\kappa_1, \kappa_2, \ldots, \kappa_r)$, the parameters to be estimated, i.e.,

$$ \pi_i = f_i(\kappa) , \quad i = 1, 2, \ldots, q . \tag{31} $$

Then $F$ is regarded as a function $H(\kappa)$ of $\kappa_1, \kappa_2, \ldots, \kappa_r$. The derivatives of $H$ of first and second order are again given by (9) and (10), but now $K$ is the matrix of order $q \times r$ whose $ij$th element is $\partial f_i/\partial\kappa_j$. The function $H(\kappa)$ may be minimized by the Fletcher-Powell method as before.

The advantage of this method compared to the more general one of the preceding section is that the function now contains many fewer parameters and the minimization is therefore faster. The Fletcher-Powell algorithm is relatively easy to apply even in the nonlinear case and the iterations converge quadratically from an arbitrary starting point to a minimum of the function, although there is no guarantee that this is the absolute minimum if several local minima exist.

5. Analysis of Artificial Data

The following hypothetical economic model is taken from Brown (1959):

$$ C = a_0 + a_1 W + a_2 \Pi + u_1 \tag{32a} $$

$$ W = b_0 + b_1 Y + b_2 Y_{-1} + u_2 \tag{32b} $$

$$ W + \Pi + T_g = Y \tag{32c} $$

$$ C + E = Y \tag{32d} $$

where the dependent variables are

C = consumer expenditures
W = wage-salary bill
Π = nonwage income
Y = total income, production and expenditure

and the predetermined variables are

T_g = government net revenue
E = all nonconsumer spending on newly produced final goods
Y_{-1} = value of Y lagged one time period

and where $u_1$ and $u_2$ are random disturbance terms assumed to be uncorrelated with the predetermined variables. This hypothetical model will be used to illustrate some of the ideas and methods of the previous sections.

To begin with we shall assume that the variables involved in this model are not directly observed. Instead they are assumed to represent true variables that can only be measured with errors. Such an assumption may not be unreasonable, as pointed out by Johnston (1963):

To be realistic we must recognize that most economic statistics contain errors of measurement, so that they are only approximations to the underlying "true" values. Such errors may arise because totals are estimated on a sample basis or, even if a complete enumeration is attempted, errors and inaccuracies may creep in. Often, too, the published statistics may represent an attempt to measure concepts which are different from those postulated in the theory (p. 148).
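Converting the variables to deviations from mean values and writing $\eta' = (C, W, \Pi, Y)$, $\xi' = (T_g, E, Y_{-1})$ and $\zeta' = (u_1, u_2, 0, 0)$, model (32) may be written in the form of (1) as

$$ \begin{pmatrix} 1 & -a_1 & -a_2 & 0 \\ 0 & 1 & 0 & -b_1 \\ 0 & 1 & 1 & -1 \\ 1 & 0 & 0 & -1 \end{pmatrix} \eta = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & b_2 \\ -1 & 0 & 0 \\ 0 & -1 & 0 \end{pmatrix} \xi + \zeta \tag{33} $$

There are 19 independent parameters in this model, namely 4 in $B$ and $\Gamma$, 6 in

$$ \Phi = \begin{pmatrix} \sigma^2_{T_g} & & \\ \sigma_{T_g E} & \sigma^2_E & \\ \sigma_{T_g Y_{-1}} & \sigma_{E Y_{-1}} & \sigma^2_{Y_{-1}} \end{pmatrix} \tag{34} $$

3 in

$$ \Psi = \begin{pmatrix} \sigma^2_{u_1} & & & \\ \sigma_{u_1 u_2} & \sigma^2_{u_2} & & \\ 0 & 0 & 0 & \\ 0 & 0 & 0 & 0 \end{pmatrix} \tag{35} $$

and 6 in $\Theta_\varepsilon = \mathrm{diag}(\theta_C, \theta_W, \theta_\Pi, \theta_Y)$ and $\Theta_\delta = \mathrm{diag}(\theta_{T_g}, \theta_E, \theta_{Y_{-1}})$. Note that since (32c) and (32d) are error-free equations, $\Psi$ has the form (35) with zero variances and covariances for $\zeta_3$ and $\zeta_4$. Also since $Y_{-1}$ is $Y$ lagged, we have assumed that the error variances in $Y$ and $Y_{-1}$ are the same. Therefore, $\Theta_\varepsilon$ and $\Theta_\delta$ have only 6 independent elements.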



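Data were generated from this model by assigning the following values to each of the 19 parameters

$$ a_1 = 0.8 \quad a_2 = 0.4 \quad b_1 = 0.3 \quad b_2 = 0.2 $$

$$ \sigma^2_{T_g} = 1.0 \quad \sigma^2_E = 2.0 \quad \sigma^2_{Y_{-1}} = 3.0 \quad \sigma_{T_g E} = 0.1 \quad \sigma_{T_g Y_{-1}} = 0.2 \quad \sigma_{E Y_{-1}} = 0.1 $$

$$ \sigma^2_{u_1} = 0.2 \quad \sigma_{u_1 u_2} = 0.1 \quad \sigma^2_{u_2} = 0.3 $$

$$ \theta_{T_g} = 0.4 \quad \theta_E = 0.6 \quad \theta_C = 0.5 \quad \theta_W = 0.6 \quad \theta_\Pi = 0.9 \quad \theta_Y = \theta_{Y_{-1}} = 0.5 \tag{36} $$

The resulting $\Sigma$, obtained from (4) and rounded to 3 decimals, is

             C        W        Π        Y       T_g       E      Y_{-1}
  C        4.599
  W        2.481    2.069
  Π        4.659    2.159    7.514
  Y        6.449    3.731    7.409   10.799
  T_g     -0.692   -0.138   -1.454   -0.592    1.160
  E        2.100    1.250    2.750    4.100    0.100    2.360
  Y_{-1}   0.442    0.763   -0.421    0.542    0.200    0.100    3.250

(37)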



For the purpose of illustrating the estimation method of section 3, the above matrix is regarded as a sample dispersion matrix $S$ to be analyzed. The order of the vector $\lambda$ is 78, since there are 78 elements in $B$, $\Gamma$, $\Phi$, $\Psi$, $\Theta_\varepsilon$ and $\Theta_\delta$ together. Of these, 54 are fixed and 24 are nonfixed, so that $\pi$ is of order 24. Because of the symmetry of $\Phi$ and $\Psi$ and the imposed equality of $\theta_Y$ and $\theta_{Y_{-1}}$, there are 19 independent parameters, so that the order of $\kappa$ is 19.
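The construction of the artificial dispersion matrix can be retraced with a short script. The sketch below writes model (32), in deviations from means, as Bη = Γξ + ζ with η = (C, W, Π, Y) and ξ = (T_g, E, Y_{-1}); this particular arrangement of rows and signs is one possible choice. The parameter values are those read from the printed list and should be treated as assumptions wherever that list is hard to read, in particular the error standard deviation used for Π.

```python
import numpy as np

a1, a2, b1, b2 = 0.8, 0.4, 0.3, 0.2

# Rows: (32a), (32b), (32c), (32d); columns of B: C, W, Pi, Y.
B = np.array([[1.0, -a1, -a2, 0.0],
              [0.0, 1.0, 0.0, -b1],
              [0.0, 1.0, 1.0, -1.0],
              [1.0, 0.0, 0.0, -1.0]])
# Columns of Gamma: T_g, E, Y_{-1}.
Gamma = np.array([[0.0, 0.0, 0.0],
                  [0.0, 0.0, b2],
                  [-1.0, 0.0, 0.0],
                  [0.0, -1.0, 0.0]])
Phi = np.array([[1.0, 0.1, 0.2],
                [0.1, 2.0, 0.1],
                [0.2, 0.1, 3.0]])
Psi = np.zeros((4, 4))
Psi[:2, :2] = [[0.2, 0.1], [0.1, 0.3]]       # the identities carry no residuals
theta_eps = np.array([0.5, 0.6, 0.9, 0.5])   # C, W, Pi (assumed value), Y
theta_delta = np.array([0.4, 0.6, 0.5])      # T_g, E, Y_{-1}

A = np.linalg.inv(B)
D = A @ Gamma
Sigma = np.block([
    [D @ Phi @ D.T + A @ Psi @ A.T + np.diag(theta_eps ** 2), D @ Phi],
    [Phi @ D.T, Phi + np.diag(theta_delta ** 2)],
])
print(np.round(Sigma, 3))
```

Counting as in the text, the six parameter matrices hold 16 + 12 + 9 + 16 + 16 + 9 = 78 elements.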
The minimization of $H(\kappa)$ started at the point

$$ a_1 = 0.6 , \quad a_2 = 0.5 , \quad b_1 = 0.4 , \quad b_2 = 0.1 $$




g 



2.0 



(J = (J 

T E T Y 
g g 



a. 



EY 



= 0.0 



g 



-1 



2 



= 0*5 , = 0.5 , O' 



= 0.0 



cr 





6 ^ = 0.4 



= 0.6 , - 0.5 



-1 



g 



; 0^ - 0.6 ^ = 0.9 , - 0,^ 



From this point seven steepest descent iterations were performed. Thereafter Fletcher-Powell iterations were used and it took 23 such iterations to reach a point where all derivatives were less than 0.00005 in absolute value. At this point, the solution was correct to four decimals and the $\Sigma$ in (37) was reproduced exactly. Twenty-three Fletcher-Powell iterations required for convergence is not considered excessive since no information about second-order derivatives was used and it takes at least 19 Fletcher-Powell iterations to build up an estimate of the matrix of second-order derivatives.

We now consider model (32a-d) in the case when the variables are observed without errors of measurement. Then the method of section 3 cannot be applied directly since the two identities (32c) and (32d) imply that $\Sigma$ is singular. Therefore, two of the endogenous variables must be eliminated from the system. It seems most convenient to eliminate $C$ and $Y$. When these variables have been eliminated, the structural equations become

$$ \begin{pmatrix} 1 - a_1 & 1 - a_2 \\ 1 - b_1 & -b_1 \end{pmatrix} \begin{pmatrix} W \\ \Pi \end{pmatrix} = \begin{pmatrix} -1 & 1 & 0 \\ b_1 & 0 & b_2 \end{pmatrix} \begin{pmatrix} T_g \\ E \\ Y_{-1} \end{pmatrix} + \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \tag{38} $$

This system may be estimated by the method of section 4.

To illustrate the application of the estimation procedure we use a dispersion matrix $S$ obtained from $\Sigma$ in (37) by subtracting the error variances from the diagonal elements and deleting rows and columns corresponding to $C$ and $Y$. There are 6 nonfixed elements in $B$ and $\Gamma$, namely $\beta_{11}$, $\beta_{12}$, $\beta_{21}$, $\beta_{22}$, $\gamma_{21}$ and $\gamma_{23}$. These are the elements of the vector $\pi$. These elements are functions of $a_1$, $a_2$, $b_1$ and $b_2$ defined by [compare equation (31)]

$$ \beta_{11} = 1 - a_1 , \quad \beta_{12} = 1 - a_2 , \quad \beta_{21} = 1 - b_1 , \quad \beta_{22} = -b_1 , \quad \gamma_{21} = b_1 , \quad \gamma_{23} = b_2 \tag{39} $$

Thus the function $F$ is a function of 4 independent parameters.

The function $F$ was minimized using only Fletcher-Powell iterations starting from the point

$$ a_1 = 0.6 , \quad a_2 = 0.5 , \quad b_1 = 0.4 , \quad b_2 = 0.1 $$

The solution point, found after 8 iterations, was, as expected, $a_1 = 0.8$, $a_2 = 0.4$, $b_1 = 0.3$, $b_2 = 0.2$ with
6. An Economic Application



In this section ve apply methods GFLGRV and RI'LGHV to a small economic 
model taken from the literature. The model is Klein's model of the United
States economy presented in Klein (1950^ pp. 5^-66): 



Consumption:      $C = a_0 + a_1 P + a_2 P_{-1} + a_3 W + u_1$        (40a)
Investment:       $I = b_0 + b_1 P + b_2 P_{-1} + b_3 K_{-1} + u_2$   (40b)
Private wages:    $W^* = c_0 + c_1 E + c_2 E_{-1} + c_3 A + u_3$      (40c)
Product:          $Y + T = C + I + G$                                 (40d)
Income:           $Y = P + W$                                         (40e)
Capital:          $K = K_{-1} + I$                                    (40f)
Wages:            $W = W^* + W^{**}$                                  (40g)
Private product:  $E = Y + T - W^{**}$                                (40h)

where the endogenous variables are

C = consumption
I = investment
W* = private wage bill
P = profits
Y = national income
K = end-of-year capital stock
W = total wage bill
E = private product

and the predetermined variables are the lagged endogenous variables $P_{-1}$, $K_{-1}$ and $E_{-1}$ and the exogenous variables

1 = unity
W** = government wage bill
T = indirect taxes
G = government expenditures
A = time in years from 1931.

All variables except 1 and A are in billions of 1934 dollars.

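This model contains eight dependent variables and eight predetermined variables. There are three equations involving residual terms. The other five equations are identities. Using the five identities (40d)-(40h), $P$, $Y$, $K$, $W$ and $E$ may be solved for and substituted into (40a)-(40c). This gives a model with the following structural form

$$ \begin{pmatrix} 1 - a_1 & -a_1 & a_1 - a_3 \\ -b_1 & 1 - b_1 & b_1 \\ -c_1 & -c_1 & 1 \end{pmatrix} \begin{pmatrix} C \\ I \\ W^* \end{pmatrix} = \begin{pmatrix} a_0 & a_3 - a_1 & -a_1 & a_1 & 0 & a_2 & 0 & 0 \\ b_0 & -b_1 & -b_1 & b_1 & 0 & b_2 & b_3 & 0 \\ c_0 & -c_1 & 0 & c_1 & c_3 & 0 & 0 & c_2 \end{pmatrix} \begin{pmatrix} 1 \\ W^{**} \\ T \\ G \\ A \\ P_{-1} \\ K_{-1} \\ E_{-1} \end{pmatrix} + \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} \tag{41} $$

There are 24 nonfixed elements in $B$ and $\Gamma$. These are all linear functions of the 12 unknown coefficients in (40a-c) as follows

$$ \beta_{11} = 1 - a_1 , \quad \beta_{12} = -a_1 , \quad \beta_{13} = a_1 - a_3 , $$
$$ \beta_{21} = -b_1 , \quad \beta_{22} = 1 - b_1 , \quad \beta_{23} = b_1 , $$
$$ \beta_{31} = -c_1 , \quad \beta_{32} = -c_1 , $$
$$ \gamma_{11} = a_0 , \quad \gamma_{12} = a_3 - a_1 , \quad \gamma_{13} = -a_1 , \quad \gamma_{14} = a_1 , \quad \gamma_{16} = a_2 , $$
$$ \gamma_{21} = b_0 , \quad \gamma_{22} = -b_1 , \quad \gamma_{23} = -b_1 , \quad \gamma_{24} = b_1 , \quad \gamma_{26} = b_2 , \quad \gamma_{27} = b_3 , $$
$$ \gamma_{31} = c_0 , \quad \gamma_{32} = -c_1 , \quad \gamma_{34} = c_1 , \quad \gamma_{35} = c_3 , \quad \gamma_{38} = c_2 . $$

In matrix form, $\pi = K\kappa + c$ with $\kappa' = (a_0, a_1, a_2, a_3, b_0, b_1, b_2, b_3, c_0, c_1, c_2, c_3)$, where $K$ is of order $24 \times 12$ with elements 0, 1 and $-1$, and $c$ is a vector with unit elements in the positions of $\beta_{11}$ and $\beta_{22}$ and zeros elsewhere.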
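The substitution of the identities into the three behavioural equations can be checked mechanically. The sketch below builds B and Γ for the endogenous variables (C, I, W*) and the predetermined variables (1, W**, T, G, A, P_{-1}, K_{-1}, E_{-1}) from the twelve coefficients, solves the system for a set of hypothetical values with zero residuals, and confirms that the original equations (40a)-(40c) hold. The function name and all numbers are assumptions for illustration only and are not Klein's data or estimates.

```python
import numpy as np

def klein_structural_matrices(kappa):
    """Map (a0..a3, b0..b3, c0..c3) to B (3 x 3) and Gamma (3 x 8)."""
    a0, a1, a2, a3, b0, b1, b2, b3, c0, c1, c2, c3 = kappa
    B = np.array([[1 - a1, -a1, a1 - a3],
                  [-b1, 1 - b1, b1],
                  [-c1, -c1, 1.0]])
    # Columns: 1, W**, T, G, A, P_{-1}, K_{-1}, E_{-1}
    Gamma = np.array([[a0, a3 - a1, -a1, a1, 0.0, a2, 0.0, 0.0],
                      [b0, -b1, -b1, b1, 0.0, b2, b3, 0.0],
                      [c0, -c1, 0.0, c1, c3, 0.0, 0.0, c2]])
    return B, Gamma

# Hypothetical coefficients and predetermined values.
kappa = [1.0, 0.2, 0.1, 0.5, 2.0, 0.3, 0.1, -0.1, 0.5, 0.4, 0.2, 0.1]
x = np.array([1.0, 5.0, 7.0, 10.0, 3.0, 12.0, 200.0, 60.0])

B, Gamma = klein_structural_matrices(kappa)
C, I, W_priv = np.linalg.solve(B, Gamma @ x)      # residuals set to zero

# Recover the eliminated variables from the identities (40d), (40e), (40g), (40h).
_, W_gov, T, G, A, P_lag, K_lag, E_lag = x
Y = C + I + G - T
W = W_priv + W_gov
P = Y - W
E = Y + T - W_gov
```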



From annual observations, United States, 1921-1941, the following raw moment matrices are obtained:



C I 

C /62166.6J 

S = I ( 16791 O 1 286.02 

\ 

W076.Y8 1217*92 






?8560.86y 




. (42) 
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G 

A 

P 



^-1 



-xy 




1 


/ 1155.90 


26.60 


763.60 




/ 5977.53 


103.80 


4044.07 


T 


/ 7858.86 


160.40 


5315.62 


G 1 


11633.68 


243.19 


7922.46 


A 


577.70 


- 105.60 


460.90 




18929.37 


655.33 


12871.73 


^-li 


227767.38 


5073.25 


153470.56 




1 66815.25 


1831.13 


45288 . 51 / 






A 



626.87 

789.27 

1200,19 

258,00 

1746,22 



1054.95 

1546. n 
176.00 
2548.46 



2569.94 

421.70 

5451.86 



770.00 

- 11.90 



-1 



'-1 \ 



5956.29 



4210.40 21685.18 28766.25 42026.14 590.60 69075.54 846152.70 

\ 1217.70 6564.45 8456.55 12475.50 495.60 20542.22 244984.77 72200 . 05 / 



The following estimated model was obtained 

C =: 18.518 - O. 229 P + 0.564P ^ -t 0.802W 
I = 27.273 - O. 797 P + I. 05 IP 



with 






-1 



'.■!* = 5.766 0.255E + 0.234s ^ -t 0.254 a + 



. /45.7T5 \ 

V* =( 80.456 265.856 1 

V 9.834 80.247 57 . 540 / 



1 



0.i48k_^ + 



( 45 ) 



(44) 
The standard errors of the estimated parameters may be obtained from a formula for the asymptotic variance-covariance matrix developed by Rothenberg and Leenders (1964).



7. The Special Case of No Residuals

When there are no residuals in (1), the relations between $\eta$ and $\xi$ are exact. The joint distribution of $\eta$ and $\xi$ is singular and of rank $n$. In the equation (4) for $\Sigma$, the second term in $\Sigma_{yy}$ vanishes. In general, when there are fixed and constrained elements in $B$ and $\Gamma$ or in $\Phi$, $\Theta_\varepsilon$ and $\Theta_\delta$, this model has to be estimated by the method of section 3. This may be done by choosing $\Psi = 0$ and specifying the fixed elements and the constraints as described in that section.

The matrix $\Sigma$ can also be written

$$ \Sigma = \Lambda\Phi\Lambda' + \Theta^2 , \tag{45} $$

where

$$ \Lambda = \begin{pmatrix} B^{-1}\Gamma \\ I \end{pmatrix} , \quad \Theta = \begin{pmatrix} \Theta_\varepsilon & 0 \\ 0 & \Theta_\delta \end{pmatrix} , \tag{46} $$

from which it is seen that the model is identical to a certain restricted factor analysis model. Several special cases will now be considered.

If $B = I$ and $\Gamma$ is unconstrained, i.e., all elements of $\Gamma$ are regarded as free parameters, model (45) is formally equivalent to an unrestricted factor model (Jöreskog, 1969). The matrix $\Lambda$ in (46) may be obtained from any $\Lambda^*$ of order $(m + n) \times n$ satisfying

$$ \Sigma = \Lambda^*\Lambda^{*\prime} + \Theta^2 \tag{47} $$

by a transformation of $\Lambda^*$ to a reference variables solution where the x's are used as reference variables. Maximum likelihood estimates of $\Lambda^*$ and $\Theta$ may be obtained by the method of Jöreskog (1967a, b) which also yields a large sample $\chi^2$ test of goodness of fit. Let the estimate of $\Lambda^*$ be partitioned as

$$ \hat{\Lambda}^* = \begin{pmatrix} \hat{\Lambda}^*_1 \\ \hat{\Lambda}^*_2 \end{pmatrix} , \tag{48} $$

where $\hat{\Lambda}^*_1$ is of order $m \times n$ and $\hat{\Lambda}^*_2$ of order $n \times n$. Then the maximum likelihood estimates of $\Gamma$ and $\Phi$ are

$$ \hat{\Gamma} = \hat{\Lambda}^*_1 \hat{\Lambda}^{*-1}_2 \tag{49} $$

$$ \hat{\Phi} = \hat{\Lambda}^*_2 \hat{\Lambda}^{*\prime}_2 \tag{50} $$
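The transformation to a reference variables solution amounts to two matrix operations: the upper block of the loading matrix is postmultiplied by the inverse of the lower block to give the coefficient matrix, and the lower block times its transpose gives the covariance matrix of the independent variables. A sketch with a hypothetical function name and made-up numbers:

```python
import numpy as np

def structural_from_factor_loadings(L_star, m):
    """Split an (m + n) x n loading matrix and recover Gamma and Phi (case B = I)."""
    L1, L2 = L_star[:m, :], L_star[m:, :]
    Gamma_hat = L1 @ np.linalg.inv(L2)   # upper block times inverse of lower block
    Phi_hat = L2 @ L2.T                  # lower block times its transpose
    return Gamma_hat, Phi_hat

# Hypothetical check: start from known Gamma and Phi = T T', build a loading
# matrix with the x's as the last n rows, and recover the originals.
Gamma = np.array([[0.5, 0.2], [0.1, 0.7], [0.3, 0.0]])   # m = 3, n = 2
T = np.array([[1.0, 0.0], [0.4, 1.2]])                    # any nonsingular factor of Phi
L_star = np.vstack([Gamma @ T, T])
Gamma_hat, Phi_hat = structural_from_factor_loadings(L_star, m=3)
```

Any other loading matrix obtained from this one by an orthogonal rotation of the columns gives the same result, which is why the choice of unrestricted solution does not matter.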



If $B = I$ and $\Gamma$ is constrained to have some fixed elements while the remaining elements in $\Gamma$ are free parameters, model (45) is formally equivalent to a restricted factor model in the sense of Jöreskog (1969). This model may be estimated by the procedure described in the same paper and, in large samples, standard errors of the estimates and a goodness of fit test can also be obtained. A computer program for this procedure is available (Jöreskog & Gruvaeus, 1967).

A more general case is when $B$ is lower triangular. The structural
equation system for the true variates is then a causal chain. In general
such a causal chain may be estimated by the method described in section 3
of the paper, though there may be simpler methods. One example occurs when
the system is normalized by fixing one element in each row of $\Gamma$ to unity
and $B$ has the form

$$B = \begin{pmatrix}
\beta_{11} & 0 & \cdots & 0 \\
\beta_{21} & \beta_{22} & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
\beta_{m1} & \beta_{m2} & \cdots & \beta_{mm}
\end{pmatrix} ,$$

where all the $\beta$'s are free parameters. Then there is a one-to-one
transformation between the free parameters of $B$ and the free elements of
$A = B^{-1}$. One may therefore estimate $A$ instead of $B$. In this case,
the variance-covariance matrix $\Sigma$ is of the form

$$\Sigma = B^{*}\tilde{\Lambda}\Phi\tilde{\Lambda}'B^{*\prime} + \Theta^{2} , \tag{51}$$

where

$$B^{*} = \begin{pmatrix} A & 0 \\ 0 & I \end{pmatrix} , \qquad
\tilde{\Lambda} = \begin{pmatrix} \Gamma \\ I \end{pmatrix} . \tag{52}$$

Model (51) is a special case of a general model for covariance structures
developed by Jöreskog (1970) and may be estimated using the computer program
ACOVS (Jöreskog, Gruvaeus & van Thillo, 1970). In this model $A$, $\Gamma$, $\Phi$, $\Theta_{\epsilon}$
and $\Theta_{\delta}$ may contain fixed parameters and even parameters constrained to be
equal in groups. The computer program gives maximum likelihood estimates of
the free parameters in $A$, $\Gamma$, $\Phi$, $\Theta_{\epsilon}$ and $\Theta_{\delta}$ and, in large samples,
standard errors of these estimates and a test of overall goodness of fit of
the model can also be obtained.

More generally, the above mentioned method may be used whenever $\Sigma$
can be written in the form (51) such that there is a one-to-one correspondence
between the free parameters in $B$ and $\Gamma$ and the distinct free elements in
$B^{*}$ and $\tilde{\Lambda}$. For a less trivial example, see Jöreskog (1970, section 2.6).
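To make the factored covariance structure concrete, the Python sketch below assumes that $\Sigma$ can be written as a block-diagonal matrix $\mathrm{diag}(A, I)$ times $(\Gamma', I)'\,\Phi\,(\Gamma', I)$ times its transpose, plus a diagonal matrix of error variances, with $A = B^{-1}$ lower triangular. The function names and the small numerical example ($m = 2$, $n = 1$) are hypothetical and serve only to show how the pieces fit together; this is not the ACOVS program.

```python
# Sketch of the factored covariance form; names and numbers are hypothetical.
from fractions import Fraction as Fr


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def identity(n):
    return [[Fr(int(i == j)) for j in range(n)] for i in range(n)]


def block_diag(X, Y):
    """Return the block-diagonal matrix diag(X, Y)."""
    cx, cy = len(X[0]), len(Y[0])
    top = [list(row) + [Fr(0)] * cy for row in X]
    bottom = [[Fr(0)] * cx + list(row) for row in Y]
    return top + bottom


def implied_sigma(A, gamma, phi, theta_sq):
    """Sigma = diag(A, I) [Gamma; I] Phi [Gamma; I]' diag(A, I)' + diag(theta_sq)."""
    n = len(phi)
    b_star = block_diag(A, identity(n))
    lam = [list(row) for row in gamma] + identity(n)   # Gamma stacked on I
    core = matmul(matmul(lam, phi), transpose(lam))
    sigma = matmul(matmul(b_star, core), transpose(b_star))
    for i, t in enumerate(theta_sq):
        sigma[i][i] += t                                # add error variances
    return sigma


A = [[Fr(1), Fr(0)], [Fr(2), Fr(1)]]    # lower triangular A = B^{-1}
GAMMA = [[Fr(1)], [Fr(3)]]
PHI = [[Fr(2)]]
THETA_SQ = [Fr(1), Fr(1), Fr(1)]        # error variances for y1, y2, x

SIGMA = implied_sigma(A, GAMMA, PHI, THETA_SQ)
```

In this example $D = A\Gamma = (1, 5)'$, so the upper-left block of `SIGMA` equals $D\Phi D'$ plus the error variances of the $y$'s, and the lower-right element equals $\Phi$ plus the error variance of $x$.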

7. A Psychological Application

In this section we consider a simplified model for the prediction of
achievements in mathematics (M) and science (S) at different grade levels.
To estimate the model we make use of longitudinal data from a growth study
conducted at Educational Testing Service (Anderson & Maier, 1963; Hilton,
1969). In this study a nationwide sample of fifth graders was tested in
1961 and then again in 1963, 1965 and 1967 as seventh, ninth and eleventh
graders, respectively. The test scores employed in this model are the
verbal (V) and quantitative (Q) parts of SCAT (Scholastic Aptitude
Test) obtained in 1961 and the achievement tests in mathematics ($M_5$, $M_7$, $M_9$,
$M_{11}$) and science ($S_5$, $S_7$, $S_9$, $S_{11}$) obtained in 1961, 1963, 1965 and 1967,
respectively. The achievement tests have been scaled so that the unit of
measurement is approximately the same at all grade levels.

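The model is depicted in Figure 1, where $V$, $Q$, $M_5$, $M_7$, $M_9$, $M_{11}$,
$S_5$, $S_7$, $S_9$ and $S_{11}$ denote the true scores of the tests and
$\zeta_1, \zeta_2, \ldots, \zeta_8$ the corresponding residuals. The model for the true scores
is

$$M_5 = a_1 V + a_2 Q + \zeta_1 \tag{53a}$$

$$S_5 = b_1 V + b_2 Q + \zeta_2 \tag{53b}$$

$$M_7 = c_1 M_5 + \zeta_3 \tag{53c}$$

$$S_7 = d_1 S_5 + d_2 M_7 + \zeta_4 \tag{53d}$$

$$M_9 = e_1 M_7 + \zeta_5 \tag{53e}$$

$$S_9 = f_1 S_7 + f_2 M_9 + \zeta_6 \tag{53f}$$

$$M_{11} = g_1 M_9 + \zeta_7 \tag{53g}$$

$$S_{11} = h_1 S_9 + h_2 M_{11} + \zeta_8 \tag{53h}$$

This model postulates the major influences on a student's achievement in
mathematics and science at various grade levels. At grade 5 the main
determinants of a student's achievements are his verbal and quantitative
abilities at that stage. At higher grade levels, however, the achievements
are mainly determined by his achievements in the earlier grades. Thus,
achievement in mathematics in grade $i$ is determined mainly by the
achievement in mathematics in grade $i - 2$, whereas achievement in
science in grade $i$ is determined mainly by the achievement in science in
grade $i - 2$ and in mathematics in grade $i$, $i = 7, 9, 11$.

The structural form of this model is

$$
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
-c_1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & -d_1 & -d_2 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & -e_1 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & -f_1 & -f_2 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & -g_1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & -h_1 & -h_2 & 1
\end{pmatrix}
\begin{pmatrix} M_5 \\ S_5 \\ M_7 \\ S_7 \\ M_9 \\ S_9 \\ M_{11} \\ S_{11} \end{pmatrix}
=
\begin{pmatrix}
a_1 & a_2 \\ b_1 & b_2 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0
\end{pmatrix}
\begin{pmatrix} V \\ Q \end{pmatrix}
+
\begin{pmatrix} \zeta_1 \\ \zeta_2 \\ \zeta_3 \\ \zeta_4 \\ \zeta_5 \\ \zeta_6 \\ \zeta_7 \\ \zeta_8 \end{pmatrix} .
\tag{54}
$$

It is seen that this model is a causal chain. The model can be estimated by
the method described in section 3, provided some assumption is made about the
intercorrelations of the residuals $\zeta_1, \zeta_2, \ldots, \zeta_8$. Without such an assumption
the model is not identified. We have chosen to make the assumption that
all residuals are uncorrelated except $\zeta_1$ and $\zeta_2$. This assumption does
not seem to be too unrealistic.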

The data that we use consist of a random sample of 750 boys taken from
all the boys that took all tests at all occasions. The variance-covariance
matrices are

S_yy =

          M5       S5       M7       S7       M9       S9      M11      S11
  M5   130.690
  S5   115.645  179.617
  M7   116.162  123.833  193.537
  S7    90.709  114.564  120.426  148.?48
  M9   119.564  125.22?  155.883  120.492  215.894
  S9   104.430  135.074  137.827  135.231  159.783  218.067
  M11  119.712  126.470  149.930  112.218  175.497  149.045  264.071
  S11   90.916  116.950  117.439  109.187  133.839  147.115  143.218     ?

S_xy =

          M5       S5       M7       S7       M9       S9      M11      S11
  V     97.5??  122.919  106.837   96.252  108.748  107.750  107.042   94.613
  Q     76.587   82.389   87.859   65.703   91.502   72.534   89.617   64.455

S_xx =

           V        Q
  V    138.014
  Q     75.518   80.751



The estimated model is

$$M_5 = 0.640V + 0.415Q + \zeta_1 \tag{55a}$$

$$S_5 = 1.296V - 0.175Q + \zeta_2 \tag{55b}$$

$$M_7 = 1.097M_5 + \zeta_3 \tag{55c}$$

$$S_7 = 0.325S_5 + 0.495M_7 + \zeta_4 \tag{55d}$$

$$M_9 = 1.027M_7 + \zeta_5 \tag{55e}$$

$$S_9 = 0.707S_7 + 0.385M_9 + \zeta_6 \tag{55f}$$

$$M_{11} = 0.951M_9 + \zeta_7 \tag{55g}$$

$$S_{11} = 0.658S_9 + 0.184M_{11} + \zeta_8 \tag{55h}$$

The estimated variance-covariance matrix of the true scores V and Q is

           V        Q
  V     105.48
  Q      75.95    76.68

Estimated residual variances and error variances for each measure are given
below.

  Measure   Residual Variance   Error Variance
  V                --                33.1
  Q                --                 4.4
  M5              10.0               25.4
  S5              22.5               11.8
  M7              26.4               40.5
  S7              29.5               24.5
  M9              25.2               29.5
  S9              28.5               36.1
  M11             75.7               18.8
  S11             20.0               47.7

The estimated correlation between $\zeta_1$ and $\zeta_2$ is 0.17.

The estimated reduced form for the true scores is

$$M_5 = 0.640V + 0.415Q + \zeta_1^{*} \tag{56a}$$

$$S_5 = 1.296V - 0.175Q + \zeta_2^{*} \tag{56b}$$

$$M_7 = 0.702V + 0.455Q + \zeta_3^{*} \tag{56c}$$

$$S_7 = 0.767V + 0.167Q + \zeta_4^{*} \tag{56d}$$

$$M_9 = 0.721V + 0.467Q + \zeta_5^{*} \tag{56e}$$

$$S_9 = 0.815V + 0.296Q + \zeta_6^{*} \tag{56f}$$

$$M_{11} = 0.686V + 0.444Q + \zeta_7^{*} \tag{56g}$$

$$S_{11} = 0.665V + 0.277Q + \zeta_8^{*} \tag{56h}$$
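Because the model is a causal chain, the reduced form follows from the structural estimates by forward substitution: each true score is replaced by its expression in V and Q before it is used as a predictor in a later equation. The Python sketch below illustrates this. The coefficients are entered as they are read from the estimated structural equations; where digits are hard to read in the source, the values used here are assumptions, and the variable and function names are hypothetical.

```python
# Sketch: propagate the (V, Q) coefficients through the causal chain to obtain
# the reduced form. Coefficients are as read from the estimated structural
# equations; hard-to-read digits are assumptions.
STRUCTURAL = [
    # (dependent variable, {predictor: coefficient})
    ("M5",  {"V": 0.640, "Q": 0.415}),
    ("S5",  {"V": 1.296, "Q": -0.175}),
    ("M7",  {"M5": 1.097}),
    ("S7",  {"S5": 0.325, "M7": 0.495}),
    ("M9",  {"M7": 1.027}),
    ("S9",  {"S7": 0.707, "M9": 0.385}),
    ("M11", {"M9": 0.951}),
    ("S11", {"S9": 0.658, "M11": 0.184}),
]


def reduced_form(structural):
    """Return {variable: (coefficient on V, coefficient on Q)} by forward substitution."""
    coefs = {"V": (1.0, 0.0), "Q": (0.0, 1.0)}
    for name, terms in structural:
        v = sum(c * coefs[p][0] for p, c in terms.items())
        q = sum(c * coefs[p][1] for p, c in terms.items())
        coefs[name] = (v, q)   # available to all later equations in the chain
    return {name: coefs[name] for name, _ in structural}


REDUCED = reduced_form(STRUCTURAL)
for name, (v, q) in REDUCED.items():
    print(f"{name:>3} = {v:.3f} V {q:+.3f} Q + reduced-form residual")
```

Under these assumed readings the computed coefficients agree with the printed reduced form to within 0.01, the small differences being attributable to the three-decimal rounding of the structural estimates.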



The relative variance contributions of V and Q, the residual and
the error, to each test's total variance are shown below:

  Measure   V and Q   Residual   Error
  M5          0.73      0.08      0.19
  S5          0.78      0.15      0.07
  M7          0.59      0.20      0.21
  S7          0.56      0.28      0.16
  M9          0.56      0.30      0.14
  S9          0.52      0.32      0.16
  M11         0.42      0.51      0.07
  S11         0.42      0.33      0.25

It is not easy to give a clear-cut interpretation of these results.
Inspecting first the equations (55c), (55e) and (55g), it is seen that a
unit increase in $M_{i-2}$ tends to have a smaller effect on $M_i$ the larger
$i$ is. This agrees with the fact that the growth curve in mathematics
"flattens" out at the higher grade levels. One would expect that the
coefficient in (55g), like in (55c) and in (55e), would be
greater than one, since, in general, for these data, the correlation of status,
$M_{i-2}$, and gain, $M_i - M_{i-2}$, is positive although usually very small.
However, the large residual variance of $\zeta_7$ suggests that $M_9$ alone is not
sufficient to account for $M_{11}$. This is probably due to the fact that
mathematics courses at the higher grades change character from being
mainly "arithmetic computation" to involving more "algebraic reasoning."

Inspecting next the equations (55d), (55f) and (55h) describing
science achievements, it is seen that the influence of mathematics on
science tends to decrease at the higher grades. This is natural since
science courses in the lower grades are based mainly on "logical reasoning"
whereas in the higher grades they are based on "memorizing of facts." The
effect of science achievements on science two years later first increases
and then decreases. This is probably because the science courses specialize
into different courses (biology, physics, etc.) at grade 11 whereas
the science test at the lower grades measures some kind of overall "science
knowledge."

Whatever may be the best interpretations of these results, the example
serves to illustrate that it is possible to have both errors in equations
and errors in variables and still have an estimable model.


References

Anderson, T. W. An introduction to multivariate statistical analysis. New
York: Wiley, 1958.

Anderson, S., & Maier, M. H. 34,000 pupils and how they grew. Journal
of Teacher Education, 1963, 14, 212-216.

Blalock, H. M. Causal inferences in nonexperimental research. Chapel Hill,
N. C.: University of North Carolina Press, 1964.

Brown, T. M. Simplified full maximum likelihood and comparative structural
estimates. Econometrica, 1959, 27, 638-653.

Brown, T. M. Simultaneous least squares: a distribution free method of
equation system structure estimation. International Economic Review,
1960, 1, 173-191.

Chernoff, H., & Divinsky, N. The computation of maximum-likelihood estimates
of linear structural equations. In W. C. Hood & T. C. Koopmans (Eds.),
Studies in econometric method, Cowles Commission Monograph 14. New York:
Wiley, 1953. Pp. 236-269.

Chow, G. C. Two methods of computing full-information maximum likelihood
estimates in simultaneous stochastic equations. International Economic
Review, 1968, 9, 100-112.

Eisenpress, H. Note on the computation of full-information maximum-likelihood
estimates of coefficients of a simultaneous system. Econometrica, 1962,
30, 343-348.

Eisenpress, H., & Greenstadt, J. The estimation of non-linear econometric
systems. Econometrica, 1966, 34, 851-861.

Fletcher, R., & Powell, M. J. D. A rapidly convergent descent method for
minimization. The Computer Journal, 1963, 6, 163-168.

Goldberger, A. S. Econometric theory. New York: Wiley, 1964.

Gruvaeus, G., & Jöreskog, K. G. A computer program for minimizing a function
of several variables. Research Bulletin 70-14. Princeton, N. J.:
Educational Testing Service, 1970.

Hilton, T. L. Growth study annotated bibliography. Progress Report 69-11.
Princeton, N. J.: Educational Testing Service, 1969.

Johnston, J. Econometric methods. New York: McGraw-Hill, 1963.

Jöreskog, K. G. Some contributions to maximum likelihood factor analysis.
Psychometrika, 1967, 32, 443-482. (a)

Jöreskog, K. G. UMLFA--A computer program for unrestricted maximum likelihood
factor analysis. Research Memorandum 66-20. Princeton, N. J.: Educational
Testing Service, revised edition, 1967. (b)

Jöreskog, K. G. A general approach to confirmatory maximum likelihood factor
analysis. Psychometrika, 1969, 34, 183-202.

Jöreskog, K. G. A general method for analysis of covariance structures.
Biometrika, 1970, 57, 239-251.

Jöreskog, K. G., & Gruvaeus, G. RMLFA--A computer program for restricted
maximum likelihood factor analysis. Research Memorandum 67-21. Princeton,
N. J.: Educational Testing Service, 1967.

Jöreskog, K. G., Gruvaeus, G. T., & van Thillo, M. ACOVS--A general computer
program for analysis of covariance structures. Research Bulletin 70-15.
Princeton, N. J.: Educational Testing Service, 1970.

Jöreskog, K. G., & van Thillo, M. LISREL--A general computer program for
estimating linear structural relationships. Research Bulletin 70-00.
Princeton, N. J.: Educational Testing Service, in preparation.

Klein, L. R. Economic fluctuations in the United States, 1921-1941. Cowles
Commission Monograph 11. New York: Wiley, 1950.

Klein, L. R. A textbook of econometrics. Evanston: Row, Peterson, 1953.

Klein, L. R. Estimation of interdependent systems in macroeconometrics.
Econometrica, 1969, 37, 171-192.

Koopmans, T. C., Rubin, H., & Leipnik, R. B. Measuring the equation systems
of dynamic economics. In T. C. Koopmans (Ed.), Statistical inference
in dynamic economic models. Cowles Commission Monograph 10. New York:
Wiley, 1950. Pp. 53-237.

Malinvaud, E. Statistical methods of econometrics. Chicago: Rand-McNally,
1966.

Rothenberg, T. J., & Leenders, C. T. Efficient estimation of simultaneous
equation systems. Econometrica, 1964, 32, 57-76.

Turner, M. E., & Stevens, C. D. The regression analysis of causal paths.
Biometrics, 1959, 15, 236-258.

Wegge, L. L. A family of functional iterations and the solution of maximum
likelihood estimation equations. Econometrica, 1969, 37, 122-130.

Werts, C. E., & Linn, R. L. Path analysis: Psychological examples.
Psychological Bulletin, 1970, 74 (3), 193-212.

Wold, H., & Jureen, L. Demand analysis. New York: Wiley, 1953.

Zellner, A. An efficient method of estimating seemingly unrelated
regressions and tests for aggregation bias. Journal of the
American Statistical Association, 1962, 57, 348-368.



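A. Appendices of Mathematical Derivations

A1. Matrix Derivatives of Function F in Section 3

The function is

$$F = \log|\Sigma| + \mathrm{tr}(S\Sigma^{-1}) , \tag{A1}$$

which is regarded as a function of $B$, $\Gamma$, $\Phi$, $\Psi$, $\Theta_{\delta}$ and $\Theta_{\epsilon}$, with $\Sigma$ defined by (4).

To derive the matrix derivatives we shall make use of matrix
differentials. In general, $dX = (dx_{ij})$ will denote a matrix of differentials
and if $F$ is a function of $X$ and $dF = \mathrm{tr}(C\,dX')$ then $\partial F/\partial X = C$.

Writing $A = B^{-1}$ and $D = B^{-1}\Gamma = A\Gamma$ we have

$$dA = d(B^{-1}) = -A\,dB\,A , \tag{A2}$$

$$dD = B^{-1}d\Gamma + dA\,\Gamma = A\,d\Gamma - A\,dB\,A\Gamma = A\,d\Gamma - A\,dB\,D . \tag{A3}$$

Furthermore, since in general,

$$d\log|X| = \mathrm{tr}(X^{-1}dX)$$

and

$$d\,\mathrm{tr}(AX^{-1}) = \mathrm{tr}(A\,dX^{-1}) = -\mathrm{tr}(AX^{-1}dX\,X^{-1}) = -\mathrm{tr}(X^{-1}AX^{-1}dX) ,$$

we obtain from (A1)

$$
\begin{aligned}
dF &= d\log|\Sigma| + d\,\mathrm{tr}(S\Sigma^{-1}) \\
   &= \mathrm{tr}(\Sigma^{-1}d\Sigma) - \mathrm{tr}(\Sigma^{-1}S\Sigma^{-1}d\Sigma) \\
   &= \mathrm{tr}[(\Sigma^{-1} - \Sigma^{-1}S\Sigma^{-1})\,d\Sigma] \\
   &= \mathrm{tr}(\Omega\,d\Sigma) \\
   &= \mathrm{tr}(\Omega_{yy}d\Sigma_{yy} + \Omega_{yx}d\Sigma_{xy} + \Omega_{xy}d\Sigma_{yx} + \Omega_{xx}d\Sigma_{xx}) ,
\end{aligned}
\tag{A4}
$$

where $\Omega$ is defined by (12) and is partitioned the same way as $\Sigma$ in (12).

From (4) and the definitions of $A$ and $D$ we have

$$\Sigma_{yy} = D\Phi D' + A\Psi A' + \Theta_{\epsilon}^{2} \tag{A5}$$

$$\Sigma_{xy} = \Sigma_{yx}' = \Phi D' \tag{A6}$$

$$\Sigma_{xx} = \Phi + \Theta_{\delta}^{2} \tag{A7}$$

from which we obtain

$$
\begin{aligned}
d\Sigma_{yy} = {} & D\Phi\,dD' + D\,d\Phi\,D' + dD\,\Phi D' \\
& + A\Psi\,dA' + A\,d\Psi\,A' + dA\,\Psi A' \\
& + 2\Theta_{\epsilon}\,d\Theta_{\epsilon}
\end{aligned}
\tag{A8}
$$

$$d\Sigma_{xy} = \Phi\,dD' + d\Phi\,D' \tag{A9}$$

$$d\Sigma_{xx} = d\Phi + 2\Theta_{\delta}\,d\Theta_{\delta} \tag{A10}$$

Substitution of $dA$ and $dD$ from (A2) and (A3) into (A8) and (A9)
gives

$$
\begin{aligned}
d\Sigma_{yy} = {} & D\Phi\,d\Gamma' A' - D\Phi D'\,dB' A' \\
& + A\,d\Gamma\,\Phi D' - A\,dB\,D\Phi D' \\
& - A\Psi A'\,dB' A' - A\,dB\,A\Psi A' \\
& + D\,d\Phi\,D' + A\,d\Psi\,A' + 2\Theta_{\epsilon}\,d\Theta_{\epsilon}
\end{aligned}
\tag{A11}
$$

$$d\Sigma_{xy} = \Phi\,d\Gamma' A' - \Phi D'\,dB' A' + d\Phi\,D' \tag{A12}$$

Substitution of (A11), (A12) and (A10) into (A4), noting that $\mathrm{tr}(C'dX)
= \mathrm{tr}(dX'C) = \mathrm{tr}(C\,dX')$ and collecting terms, shows that the matrices multiplying
$dB'$, $d\Gamma'$, $d\Phi$, $d\Psi$, $d\Theta_{\delta}$ and $d\Theta_{\epsilon}$ are the matrices on the right sides of
equations (14), (15), (16), (17), (18) and (19) respectively. These are therefore
the corresponding matrix derivatives.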
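The two differential identities used above, $d(B^{-1}) = -A\,dB\,A$ and $d\log|X| = \mathrm{tr}(X^{-1}dX)$, can be checked by finite differences. The Python sketch below does this for a hypothetical $2 \times 2$ matrix $B$ and perturbation direction $dB$; the matrices, the step size and the helper names are assumptions chosen only for illustration.

```python
# Sketch: finite-difference check of d(B^{-1}) = -A dB A and d log|B| = tr(A dB),
# with A = B^{-1}. The matrices and step size are hypothetical.
import math


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def det2(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]


def inv2(X):
    d = det2(X)
    return [[X[1][1] / d, -X[0][1] / d], [-X[1][0] / d, X[0][0] / d]]


def perturbed(X, dX, eps):
    """Return X + eps * dX."""
    return [[X[i][j] + eps * dX[i][j] for j in range(2)] for i in range(2)]


B = [[2.0, 0.0], [1.0, 3.0]]
dB = [[0.3, -0.2], [0.5, 0.1]]
EPS = 1e-6

A = inv2(B)
B_eps = perturbed(B, dB, EPS)

# Differential of the inverse: numerical slope versus -A dB A.
A_eps = inv2(B_eps)
finite_dA = [[(A_eps[i][j] - A[i][j]) / EPS for j in range(2)] for i in range(2)]
analytic_dA = [[-v for v in row] for row in matmul(matmul(A, dB), A)]
ERR_DA = max(abs(finite_dA[i][j] - analytic_dA[i][j]) for i in range(2) for j in range(2))

# Differential of the log determinant: numerical slope versus tr(A dB).
finite_dlog = (math.log(det2(B_eps)) - math.log(det2(B))) / EPS
AdB = matmul(A, dB)
analytic_dlog = AdB[0][0] + AdB[1][1]
ERR_DLOG = abs(finite_dlog - analytic_dlog)

print(ERR_DA, ERR_DLOG)
```

Both discrepancies are of the order of the step size, as expected for a one-sided difference quotient.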

A2. Information Matrix for the General Model of Section 3

In this section we shall prove a general theorem concerning the expected
second-order derivatives of any function of the type (8) and show how this
theorem can be applied to compute all the elements of the information matrix
(12).

We first prove the following

Lemma: Let $S = (1/N)\sum_{k=1}^{N}(z_k - \bar{z})(z_k - \bar{z})'$, where $z_1, z_2, \ldots, z_N$ are
independently distributed according to $N(\mu, \Sigma)$. Then the asymptotic
distribution of the elements of $\Omega = \Sigma^{-1}(\Sigma - S)\Sigma^{-1}$ is multivariate
normal with means zero and variances and covariances given by

$$N\,E(\omega_{\alpha\beta}\,\omega_{\mu\nu}) = \sigma^{\alpha\mu}\sigma^{\beta\nu} + \sigma^{\alpha\nu}\sigma^{\beta\mu} . \tag{A13}$$

Proof: The proof follows immediately by multiplying
$\omega_{\alpha\beta} = \sum_{g}\sum_{h}\sigma^{\alpha g}(\sigma_{gh} - s_{gh})\sigma^{h\beta}$
and $\omega_{\mu\nu} = \sum_{i}\sum_{j}\sigma^{\mu i}(\sigma_{ij} - s_{ij})\sigma^{j\nu}$ and using the fact that the asymptotic
variances and covariances of $S$ are given by

$$N\,E[(\sigma_{gh} - s_{gh})(\sigma_{ij} - s_{ij})] = \sigma_{gi}\sigma_{hj} + \sigma_{gj}\sigma_{hi}$$

(see e.g., Anderson, 1958, Theorem 4.2.4).

We can now prove the following general theorem.

Theorem: Under the conditions of the above lemma let the elements of $\Sigma$ be
functions of two parameter matrices $\mathbf{M} = (\mu_{gh})$ and $\mathbf{N} = (\nu_{ij})$ and
let $F(\mathbf{M}, \mathbf{N}) = \tfrac{1}{2}N[\log|\Sigma| + \mathrm{tr}(S\Sigma^{-1})]$ with $\partial F/\partial\mathbf{M} = N A\Omega B$ and
$\partial F/\partial\mathbf{N} = N C\Omega D$. Then we have asymptotically

$$(1/N)\,E(\partial^{2}F/\partial\mu_{gh}\,\partial\nu_{ij}) = (A\Sigma^{-1}C')_{gi}(B'\Sigma^{-1}D)_{hj} + (A\Sigma^{-1}D)_{gj}(B'\Sigma^{-1}C')_{hi} . \tag{A14}$$

Proof: Writing $\partial F/\partial\mu_{gh} = N a_{g\alpha}\omega_{\alpha\beta}b_{\beta h}$ and $\partial F/\partial\nu_{ij} = N c_{i\mu}\omega_{\mu\nu}d_{\nu j}$, where it
is assumed that every repeated subscript is to be summed over, we have

$$
\begin{aligned}
(1/N)\,E(\partial^{2}F/\partial\mu_{gh}\,\partial\nu_{ij})
 &= (1/N)\,E(\partial F/\partial\mu_{gh}\;\partial F/\partial\nu_{ij}) \\
 &= N\,E(a_{g\alpha}\omega_{\alpha\beta}b_{\beta h}\,c_{i\mu}\omega_{\mu\nu}d_{\nu j}) \\
 &= a_{g\alpha}b_{\beta h}c_{i\mu}d_{\nu j}\;N\,E(\omega_{\alpha\beta}\omega_{\mu\nu}) \\
 &= a_{g\alpha}b_{\beta h}c_{i\mu}d_{\nu j}\,(\sigma^{\alpha\mu}\sigma^{\beta\nu} + \sigma^{\alpha\nu}\sigma^{\beta\mu}) \\
 &= (A\Sigma^{-1}C')_{gi}(B'\Sigma^{-1}D)_{hj} + (A\Sigma^{-1}D)_{gj}(B'\Sigma^{-1}C')_{hi} .
\end{aligned}
$$

It should be noted that the theorem is quite general in that both $\mathbf{M}$
and $\mathbf{N}$ may be row or column vectors or scalars and $\mathbf{M}$ and $\mathbf{N}$ may be
identical in which case, of course, $A = C$ and $B = D$.
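As a simple illustration of the theorem, consider the scalar case in which a single variable has variance $\sigma^{2}$ and both parameter matrices reduce to the scalar $\sigma^{2}$. The worked example below is not part of the original derivation; it only checks that the general formula then reproduces the familiar large-sample information for a normal variance.

```latex
% Scalar check of the theorem: one variable, one parameter \sigma^2.
\begin{align*}
F &= \tfrac{1}{2}N\left[\log\sigma^{2} + s/\sigma^{2}\right], \qquad
\omega = \sigma^{-2}(\sigma^{2}-s)\,\sigma^{-2} = \frac{\sigma^{2}-s}{\sigma^{4}}, \\[4pt]
\frac{\partial F}{\partial\sigma^{2}}
  &= \tfrac{1}{2}N\left(\frac{1}{\sigma^{2}} - \frac{s}{\sigma^{4}}\right)
   = N\cdot\tfrac{1}{2}\cdot\omega\cdot 1
   \quad\Longrightarrow\quad A = C = \tfrac{1}{2},\;\; B = D = 1, \\[4pt]
\frac{1}{N}\,E\!\left(\frac{\partial^{2}F}{\partial(\sigma^{2})^{2}}\right)
  &= (A\sigma^{-2}C)(B\sigma^{-2}D) + (A\sigma^{-2}D)(B\sigma^{-2}C)
   = \frac{1}{4\sigma^{4}} + \frac{1}{4\sigma^{4}}
   = \frac{1}{2\sigma^{4}}, \\[4pt]
\text{directly:}\quad
\frac{\partial^{2}F}{\partial(\sigma^{2})^{2}}
  &= \tfrac{1}{2}N\left(-\frac{1}{\sigma^{4}} + \frac{2s}{\sigma^{6}}\right)
  \;\xrightarrow{\;E(s)\,\to\,\sigma^{2}\;}\; \frac{N}{2\sigma^{4}} .
\end{align*}
```

The two routes agree, which is the scalar counterpart of applying the theorem with identical parameter matrices.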

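We now show how the above theorem can be applied repeatedly to compute
all the elements of the information matrix (12). To do so we write the
derivatives (14) - (19) in the form required by the theorem.

Let $A = B^{-1}$ and $D = B^{-1}\Gamma$, as before, and

$$T\,[m \times (m + n)] = [A' \;\; 0] \tag{A15}$$

$$P\,[(m + n) \times m] = \begin{pmatrix} D\Phi D' + A\Psi A' \\ \Phi D' \end{pmatrix} \tag{A16}$$

$$Q\,[(m + n) \times n] = \begin{pmatrix} D\Phi \\ \Phi \end{pmatrix} \tag{A17}$$

$$R\,[(m + n) \times n] = \begin{pmatrix} D \\ I \end{pmatrix} \tag{A18}$$

Then it is readily verified that

$$\partial F/\partial B = -N\,T\Omega P \tag{A19}$$

$$\partial F/\partial\Gamma = N\,T\Omega Q \tag{A20}$$

$$\partial F/\partial\Phi = N\,R'\Omega R \tag{A21}$$

$$\partial F/\partial\Psi = N\,T\Omega T' \tag{A22}$$

$$\partial F/\partial\Theta = N\,\Omega\Theta \tag{A23}$$

In the last equation we have combined (18) and (19) using

$$\Theta = \begin{pmatrix} \Theta_{\epsilon} & 0 \\ 0 & \Theta_{\delta} \end{pmatrix} . \tag{A24}$$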

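A3. Matrix Derivatives of Function F in Section 4

The function is defined by

$$F = \log|\Psi| - \log|B|^{2} , \tag{A25}$$

where

$$\Psi = BS_{yy}B' - BS_{yx}\Gamma' - \Gamma S_{xy}B' + \Gamma S_{xx}\Gamma' . \tag{A26}$$

One finds immediately that

$$
\begin{aligned}
dF &= \mathrm{tr}(\Psi^{-1}d\Psi) - 2\,\mathrm{tr}(B^{-1}dB) \\
   &= \mathrm{tr}[\Psi^{-1}(dB\,S_{yy}B' + BS_{yy}\,dB' - dB\,S_{yx}\Gamma' - \Gamma S_{xy}\,dB')] \\
   &\quad - 2\,\mathrm{tr}(B^{-1}dB) \\
   &\quad + \mathrm{tr}[\Psi^{-1}(-BS_{yx}\,d\Gamma' - d\Gamma\,S_{xy}B' + d\Gamma\,S_{xx}\Gamma' + \Gamma S_{xx}\,d\Gamma')] \\
   &= 2\,\mathrm{tr}\{[\Psi^{-1}(BS_{yy} - \Gamma S_{xy}) - B'^{-1}]\,dB'\} \\
   &\quad + 2\,\mathrm{tr}[\Psi^{-1}(\Gamma S_{xx} - BS_{yx})\,d\Gamma'] ,
\end{aligned}
$$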

SO that the derivatives oF/oP and are those Given uy { y ) and 
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